Reinforcement learning is used to learn complex temporal and situational behaviours when labelled training data is either absent or insufficient. Rewards often arrive well after the action that caused them and may be the result of several past actions. This leads to credit assignment problems: it is hard to tell which actions should receive positive or negative reinforcement.
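One common way to spread a delayed reward back over the actions that preceded it is a discounted return. The sketch below is illustrative only: the function name `discounted_returns` and the discount factor `gamma` are assumptions introduced here, not something defined in this entry.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return, for each time step, the reward at that step plus the
    discounted sum of all later rewards.

    `gamma` (assumed, 0 <= gamma <= 1) controls how much credit earlier
    actions receive for rewards that arrive later.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A single reward at the final step is shared, with decreasing weight,
# among the earlier steps.
print(discounted_returns([0, 0, 0, 1], gamma=0.5))
```

With a reward only at the last step, earlier steps still receive some credit, but less the further they are from the reward.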
Reinforcement learning takes place through interactions with a real or simulated world, and each interaction has a cost, both directly due to the action being performed (energy expenditure for a robot, network costs for a web agent) and indirectly due to the positive or negative effects of the action. However, without taking actions there is no potential for learning; this leads to an exploration-exploitation trade-off.
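The exploration-exploitation trade-off can be illustrated with an epsilon-greedy strategy on a simple multi-armed bandit. This is a minimal sketch under stated assumptions: the function name `epsilon_greedy_bandit`, the Gaussian reward model and the parameter values are all hypothetical choices made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on a bandit with Gaussian rewards.

    With probability `epsilon` the learner explores (random action);
    otherwise it exploits the action with the best estimated reward so far.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each action has been taken
    estimates = [0.0] * n   # running mean reward for each action
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                          # explore
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit

        # Assumed reward model: noisy reward around the action's true mean.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the running mean for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.0, 0.5, 1.5])
print(counts)
```

Setting `epsilon` to 0 never explores and risks settling on a poor action, while a large `epsilon` keeps paying the cost of actions already known to be poor.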
Used in Chap. 6: page 83; Chap. 16: pages 236, 241, 242, 248, 249; Chap. 22: page 352
Also known as reinforcement function, reinforcement learner